Estimating probability values from an incomplete dataset

Authors

  • Silvia Acid
  • Luis M. de Campos
  • Juan F. Huete
Abstract

An essential component of Machine Learning processes is estimating an uncertainty measure reflecting the strength of the relationships between variables in a dataset. In this paper we focus on situations where the dataset has incomplete entries, as most real-life datasets do. We present a new approach to this problem. The basic idea is to initially estimate a set of probability intervals that are used to complete the missing values. These values are then used to obtain new bounds on the expected number of entries in the dataset. The probability intervals are narrowed iteratively until convergence. We show that the same process can be used to estimate both probability intervals and probability distributions, and we give conditions that guarantee that the estimator is the correct one.
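
The starting point mentioned in the abstract, an initial set of probability intervals, can be illustrated with a minimal sketch. It assumes a single categorical variable and that each missing entry could take any value in the domain, so the observed count gives a lower bound and the observed count plus the number of missing entries gives an upper bound. The function name, the use of `None` as the missing marker, and the sample data are hypothetical. This is not the authors' estimator, which also narrows the intervals iteratively.

```python
from collections import Counter


def initial_probability_intervals(values, domain):
    """Bound P(X = x) for each x in domain; None marks a missing entry.

    Assumption (not from the paper): a missing entry may be any value,
    so it can only raise the count of x, never lower it.
    """
    n = len(values)
    if n == 0:
        raise ValueError("values must be non-empty")
    # Count only the observed (non-missing) entries.
    counts = Counter(v for v in values if v is not None)
    missing = sum(1 for v in values if v is None)
    # Lower bound: no missing entry equals x. Upper bound: all of them do.
    return {x: (counts[x] / n, (counts[x] + missing) / n) for x in domain}


# Hypothetical sample: 8 entries, 2 of them missing.
data = ["a", "b", None, "a", None, "b", "a", "b"]
intervals = initial_probability_intervals(data, ["a", "b"])
print(intervals)
```

With no missing entries the two bounds coincide and the interval collapses to the ordinary relative frequency.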

Similar resources

Investigating the missing data effect on credit scoring rule based models: The case of an Iranian bank

Credit risk management is a process in which banks estimate probability of default (PD) for each loan applicant. Data sets of previous loan applicants are built by gathering their data, and these internal data sets are usually completed using external credit bureau’s data and finally used for estimating PD in banks. There is also a continuous interest for bank to use rule based classifiers to b...

A hybrid model for estimating the probability of default of corporate customers

Credit risk estimation is a key determinant for the success of financial institutions. The aim of this paper is presenting a new hybrid model for estimating the probability of default of corporate customers in a commercial bank. This hybrid model is developed as a combination of Logit model and Neural Network to benefit from the advantages of both linear and non-linear models. For model verific...

Marginal Analysis of A Population-Based Genetic Association Study of Quantitative Traits with Incomplete Longitudinal Data

A common study to investigate gene-environment interaction is designed to be longitudinal and population-based. Data arising from longitudinal association studies often contain missing responses. Naive analysis without taking missingness into account may produce invalid inference, especially when the missing data mechanism depends on the response process. To address this issue in the ana...

How Important Are Endogenous Peer Effects in Group Lending? Estimating a Static Game of Incomplete Information

We quantify the importance of endogenous peer effects in group lending programs by estimating a static game of incomplete information. Endogenous peer effects describe how one’s behavior is affected by the behavior of her peers. Using a rich dataset from a group lending program in India, our empirical analysis presents a robust finding of large peer effects. The preferred model suggests that th...

Analysis of Incomplete Climate Data: Estimation of Mean Values and Covariance Matrices and Imputation of Missing Values

Estimating the mean and the covariance matrix of an incomplete dataset and filling in missing values with imputed values is generally a nonlinear problem, which must be solved iteratively. The expectation maximization (EM) algorithm for Gaussian data, an iterative method both for the estimation of mean values and covariance matrices from incomplete datasets and for the imputation of missing val...

Journal title:
  • Int. J. Approx. Reasoning

Volume 27

Publication date: 2001